
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning

Caron, Francois, Ayed, Fadhel, Jung, Paul, Lee, Hoil, Lee, Juho, Yang, Hongseok

arXiv.org Artificial Intelligence

We consider the optimisation of large and shallow neural networks via gradient flow, where the output of each hidden node is scaled by some positive parameter. We focus on the case where the node scalings are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that, for large neural networks, with high probability, gradient flow converges to a global minimum AND can learn features, unlike in the NTK regime. We also provide experiments on synthetic and real-world datasets illustrating our theoretical results and showing the benefit of such scaling in terms of pruning and transfer learning.
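
To make the parameterisation concrete, here is a minimal sketch (not the authors' code) of a shallow network whose hidden-node outputs are multiplied by fixed, non-identical positive scalings, trained by plain gradient descent as a discrete stand-in for gradient flow. The power-law scaling scheme and the toy data are illustrative assumptions; under the NTK parameterisation every node would instead receive the identical scaling 1/sqrt(m).

```python
# A minimal sketch of a shallow network with fixed, non-identical
# positive node scalings lam_j (an assumed power-law decay, for
# illustration only), trained by plain gradient descent.
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 512, 5, 64                       # width, input dim, sample size

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                        # toy regression target

W = rng.normal(size=(m, d))                # hidden weights (trainable)
v = rng.normal(size=m)                     # output weights (trainable)

# Asymmetrical scalings: power-law decay, normalised so sum_j lam_j^2 = 1.
lam = np.arange(1, m + 1, dtype=float) ** -0.6
lam /= np.sqrt((lam ** 2).sum())

def forward(X):
    H = np.tanh(X @ W.T)                   # hidden activations, shape (n, m)
    return H @ (lam * v)                   # node j contributes lam[j] * v[j] * h_j(x)

lr = 0.5
for step in range(2000):
    H = np.tanh(X @ W.T)
    resid = H @ (lam * v) - y              # dL/dpred for the 0.5 * MSE loss
    grad_v = (H.T @ resid) * lam / n
    grad_W = ((resid[:, None] * (lam * v)) * (1.0 - H ** 2)).T @ X / n
    v -= lr * grad_v
    W -= lr * grad_W

print("final training loss:", 0.5 * np.mean((forward(X) - y) ** 2))
```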


Out-of-Distribution Generalization via Risk Extrapolation (REx)

Krueger, David, Caballero, Ethan, Jacobsen, Joern-Henrik, Zhang, Amy, Binas, Jonathan, Le Priol, Remi, Courville, Aaron

arXiv.org Artificial Intelligence

Generalizing outside of the training distribution is an open challenge for current machine learning systems. A weak form of out-of-distribution (OoD) generalization is the ability to successfully interpolate between multiple observed distributions. One way to achieve this is through robust optimization, which seeks to minimize the worst-case risk over convex combinations of the training distributions. However, a much stronger form of OoD generalization is the ability of models to extrapolate beyond the distributions observed during training. In pursuit of strong OoD generalization, we introduce the principle of Risk Extrapolation (REx). REx can be viewed as encouraging robustness over affine combinations of training risks, by encouraging strict equality between training risks. We show conceptually how this principle enables extrapolation, and demonstrate the effectiveness and scalability of instantiations of REx on various OoD generalization tasks. Our code can be found at https://github.com/capybaralet/REx_code_release.
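
The paper's variance-penalised instantiation (V-REx) makes the "strict equality between training risks" idea operational: the total loss is the mean of the per-environment risks plus a penalty on their variance. Below is a hedged sketch of that objective on synthetic environments with an environment-dependent spurious feature; the linear model, data-generating process, and penalty weight `beta` are illustrative assumptions, not the authors' experimental setup.

```python
# A hedged sketch of the V-REx objective:
#   loss = mean(per-environment risks) + beta * variance(per-environment risks)
# The penalty pushes the training risks towards equality, which here
# discourages reliance on the spurious feature x2 whose correlation with
# y changes across environments (all settings below are assumptions).
import torch

torch.manual_seed(0)

def make_env(n, spur):
    x1 = torch.randn(n, 1)
    y = 2.0 * x1 + 0.5 * torch.randn(n, 1)    # invariant (causal) relation
    x2 = spur * y + 0.5 * torch.randn(n, 1)   # spurious, environment-dependent
    return torch.cat([x1, x2], dim=1), y

envs = [make_env(512, s) for s in (1.0, 0.5)]
model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
beta = 10.0                                    # weight on the variance penalty

for step in range(500):
    risks = torch.stack([
        torch.nn.functional.mse_loss(model(x), y) for x, y in envs
    ])
    loss = risks.mean() + beta * risks.var()   # V-REx objective
    opt.zero_grad()
    loss.backward()
    opt.step()

print("per-environment risks:", [round(r, 3) for r in risks.detach().tolist()])
print("learned weights [causal, spurious]:", model.weight.detach().numpy().round(3))
```

With a large `beta`, the weight on the spurious feature is pushed towards zero, since only the invariant predictor equalises the risks across environments.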


Identity Connections in Residual Nets Improve Noise Stability

Yu, Shuzhi, Tomasi, Carlo

arXiv.org Machine Learning

Residual Neural Networks (ResNets) achieve state-of-the-art performance in many computer vision problems. Compared to plain networks without residual connections (PlnNets), ResNets train faster, generalize better, and suffer less from the so-called degradation problem. We introduce simplified (but still nonlinear) versions of ResNets and PlnNets for which these discrepancies still hold, although to a lesser degree. We establish a 1-1 mapping between simplified ResNets and simplified PlnNets, and show that they are exactly equivalent to each other in expressive power for the same computational complexity. We conjecture that ResNets generalize better because they have better noise stability, and we support this conjecture empirically for both simplified and fully-fledged networks.
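
As a rough illustration of the noise-stability idea, the sketch below evaluates the same stack of tanh layers with and without identity connections and measures how much a small input perturbation is amplified relative to the output scale. The depth, width, and relative-sensitivity probe are assumptions for illustration; they are not the paper's simplified ResNet/PlnNet construction or its equivalence mapping.

```python
# A rough noise-stability probe (illustrative assumptions only): the
# same weight matrices are used as a plain stack (x <- f(W x)) and as a
# residual stack (x <- x + f(W x)), and we compare relative sensitivity
# of the output to a small input perturbation.
import numpy as np

rng = np.random.default_rng(1)
depth, width = 20, 64
Ws = [rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
      for _ in range(depth)]

def plain(x):                              # PlnNet-style stack
    for W in Ws:
        x = np.tanh(W @ x)
    return x

def resnet(x):                             # ResNet-style stack with identity connections
    for W in Ws:
        x = x + np.tanh(W @ x)
    return x

x = rng.normal(size=width)
eps = 1e-3 * rng.normal(size=width)        # small input noise

for name, net in [("plain", plain), ("resnet", resnet)]:
    base = net(x)
    rel_out = np.linalg.norm(net(x + eps) - base) / np.linalg.norm(base)
    rel_in = np.linalg.norm(eps) / np.linalg.norm(x)
    print(f"{name:6s} relative noise amplification: {rel_out / rel_in:.3f}")
```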


Reconciling modern machine learning and the bias-variance trade-off

Belkin, Mikhail, Hsu, Daniel, Ma, Siyuan, Mandal, Soumik

arXiv.org Machine Learning

The question of generalization in machine learning---how algorithms are able to learn predictors from a training sample to make accurate predictions out-of-sample---is revisited in light of the recent breakthroughs in modern machine learning technology. The classical approach to understanding generalization is based on bias-variance trade-offs, where model complexity is carefully calibrated so that the fit on the training sample reflects performance out-of-sample. However, it is now common practice to fit highly complex models like deep neural networks to data with (nearly) zero training error, and yet these interpolating predictors are observed to have good out-of-sample accuracy even for noisy data. How can the classical understanding of generalization be reconciled with these observations from modern machine learning practice? In this paper, we bridge the two regimes by exhibiting a new "double descent" risk curve that extends the traditional U-shaped bias-variance curve beyond the point of interpolation. Specifically, the curve shows that as soon as the model complexity is high enough to achieve interpolation on the training sample---a point that we call the "interpolation threshold"---the risk of suitably chosen interpolating predictors from these models can, in fact, be decreasing as the model complexity increases, often below the risk achieved using non-interpolating models. The double descent risk curve is demonstrated for a broad range of models, including neural networks and random forests, and a mechanism for producing this behavior is posited.
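
A quick way to see the double-descent shape is least-squares regression on random ReLU features, where the pseudoinverse returns the minimum-norm interpolating solution once the feature count m reaches the sample size n. The sketch below is an assumption-based illustration, not the paper's experiments; the data, feature counts, and noise level are invented for the demo, and the test-error peak typically appears near the interpolation threshold m = n.

```python
# A hedged double-descent sketch: sweep the number of random ReLU
# features m through the sample size n, fit by minimum-norm least
# squares (pseudoinverse), and watch test error peak near m = n and
# then decrease again.  All problem settings are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n, d, n_test = 40, 5, 500

def target(X):
    return np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1]

X = rng.normal(size=(n, d))
y = target(X) + 0.1 * rng.normal(size=n)   # noisy training labels
Xt = rng.normal(size=(n_test, d))
yt = target(Xt)

for m in (5, 10, 20, 40, 80, 160, 640):    # model complexity = #random features
    errs = []
    for trial in range(20):                # average over random feature draws
        W = rng.normal(size=(d, m))
        Phi, Phit = np.maximum(X @ W, 0.0), np.maximum(Xt @ W, 0.0)
        beta = np.linalg.pinv(Phi) @ y     # min-norm least-squares solution
        errs.append(np.mean((Phit @ beta - yt) ** 2))
    tag = "  <- interpolation threshold (m = n)" if m == n else ""
    print(f"m={m:4d}  avg test MSE={np.mean(errs):.3f}{tag}")
```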